# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2021, PaddleNLP
# This file is distributed under the same license as the PaddleNLP package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2022.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: PaddleNLP \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2022-03-18 21:31+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.9.0\n"

#: ../advanced_guide/model_compression/distill_lstm.rst:2
msgid "由BERT到Bi-LSTM的知识蒸馏"
msgstr ""

#: ../advanced_guide/model_compression/distill_lstm.rst:6
msgid "整体原理介绍"
msgstr ""

#: ../advanced_guide/model_compression/distill_lstm.rst:8
msgid ""
"本例是将特定任务下BERT模型的知识蒸馏到基于Bi-LSTM的小模型中，主要参考论文 `Distilling Task-Specific "
"Knowledge from BERT into Simple Neural Networks "
"<https://arxiv.org/abs/1903.12136>`_ \\ 实现。整体原理如下："
msgstr ""

#: ../advanced_guide/model_compression/distill_lstm.rst:11
msgid "在本例中，较大的模型是BERT被称为教师模型，Bi-LSTM被称为学生模型。"
msgstr ""

#: ../advanced_guide/model_compression/distill_lstm.rst:13
msgid "小模型学习大模型的知识，需要小模型学习蒸馏相关的损失函数。在本实验中，损失函数是均方误差损失函数，传入函数的两个参数分别是学生模型的输出和教师模型的输出。"
msgstr ""

#: ../advanced_guide/model_compression/distill_lstm.rst:15
msgid ""
"在论文的模型蒸馏阶段，作者为了能让教师模型表达出更多的“暗知识”(dark "
"knowledge，通常指分类任务中低概率类别与高概率类别的关系)供学生模型学习，对训练数据进行了数据增强。通过数据增强，可以产生更多无标签的训练数据，在训练过程中，学生模型可借助教师模型的“暗知识”，在更大的数据集上进行训练，产生更好的蒸馏效果。本文的作者使用了三种数据增强方式，分别是："
msgstr ""

#: ../advanced_guide/model_compression/distill_lstm.rst:17
msgid "Masking，即以一定的概率将原数据中的word token替换成 ``[MASK]`` ；"
msgstr ""

#: ../advanced_guide/model_compression/distill_lstm.rst:19
msgid "POS—guided word replacement，即以一定的概率将原数据中的词用与其有相同POS tag的词替换；"
msgstr ""

#: ../advanced_guide/model_compression/distill_lstm.rst:21
msgid "n-gram sampling，即以一定的概率，从每条数据中采样n-gram，其中n的范围可通过人工设置。"
msgstr ""

#: ../advanced_guide/model_compression/distill_lstm.rst:26
msgid "模型蒸馏步骤介绍"
msgstr ""

#: ../advanced_guide/model_compression/distill_lstm.rst:28
msgid ""
"本实验分为三个训练过程：在特定任务上对BERT进行微调、在特定任务上对基于Bi-LSTM的小模型进行训练（用于评价蒸馏效果"
"）、将BERT模型的知识蒸馏到基于Bi-LSTM的小模型上。"
msgstr ""

#: ../advanced_guide/model_compression/distill_lstm.rst:31
msgid "1. 基于bert-base-uncased预训练模型在特定任务上进行微调"
msgstr ""

#: ../advanced_guide/model_compression/distill_lstm.rst:33
msgid ""
"训练BERT的fine-tuning模型，可以去 `PaddleNLP "
"<https:github.com/PaddlePaddle/PaddleNLP>`_ 中\\ 的 `glue "
"<https://github.com/PaddlePaddle/PaddleNLP/tree/develop/examples/benchmark/glue>`_"
" 目录下对bert-base-uncased做微调。"
msgstr ""

#: ../advanced_guide/model_compression/distill_lstm.rst:36
msgid ""
"以GLUE的SST-2任务为例，用bert-base-"
"uncased做微调之后，可以得到一个在SST-2任务上的教师模型，可以把在dev上取得最好Accuracy的模型保存下来，用于第三步的蒸馏。"
msgstr ""

#: ../advanced_guide/model_compression/distill_lstm.rst:40
msgid "2. 训练基于Bi-LSTM的小模型"
msgstr ""

#: ../advanced_guide/model_compression/distill_lstm.rst:42
msgid ""
"在本示例中，小模型采取的是基于双向LSTM的分类模型，网络层分别是 ``Embedding`` 、``LSTM`` 、 带有 ``tanh`` "
"激活函数的 ``Linear`` 层，最后经过\\ 一个全连接的输出层得到logits。``LSTM`` 网络层定义如下："
msgstr ""

#: ../advanced_guide/model_compression/distill_lstm.rst:50
msgid "基于Bi-LSTM的小模型的 ``forward`` 函数定义如下："
msgstr ""

#: ../advanced_guide/model_compression/distill_lstm.rst:66
msgid "3.数据增强介绍"
msgstr ""

#: ../advanced_guide/model_compression/distill_lstm.rst:68
msgid ""
"接下来的蒸馏过程，蒸馏时使用的训练数据集并不只包含数据集中原有的数据，而是按照上文原理介绍中的A、C两种方法进行数据增强后的总数据。 "
"在多数情况下，``alpha`` 会被设置为0，表示无视硬标签，学生模型只利用数据增强后的无标签数据进行训练。根据教师模型提供的软标签 "
"``teacher_logits`` \\ ，对比学生模型的 ``logits`` "
"，计算均方误差损失。由于数据增强过程产生了更多的数据，学生模型可以从教师模型中学到更多的暗知识。"
msgstr ""

#: ../advanced_guide/model_compression/distill_lstm.rst:72
msgid "数据增强的核心代码如下："
msgstr ""

#: ../advanced_guide/model_compression/distill_lstm.rst:107
msgid "4.蒸馏模型"
msgstr ""

#: ../advanced_guide/model_compression/distill_lstm.rst:109
msgid ""
"这一步是将教师模型BERT的知识蒸馏到基于Bi-LSTM的学生模型中，在本例中，主要是让学生模型（Bi-"
"LSTM）去学习教师模型的输出logits。\\ 蒸馏时使用的训练数据集是由上一步数据增强后的数据，核心代码如下："
msgstr ""

